Forming an Integrated Lexical Resource for Word Sense Disambiguation
نویسنده
چکیده
This paper reports a full-scale linkage of noun senses between two existing lexical resources, namely WordNet and Roget's Thesaurus, to form an Integrated Lexical Resource (ILR) for use in natural language processing (NLP). The linkage was founded on a structurally-based sense-mapping algorithm. About 18,000 nouns with over 30,000 senses were mapped. Although exhaustive verification is impractical, we show that it is reasonable to expect some 70-80% accuracy of the resultant mappings. More importantly, the ILR, which contains enriched lexical information, is readily usable in many NLP tasks. We shall explore some practical use of the ILR in word sense disambiguation (WSD), as WSD notably requires a wide range of lexical information.
منابع مشابه
Using Relevant Domains Resource for Word Sense Disambiguation
This paper presents a new method for Word Sense Disambiguation based on the WordNet Domains lexical resource [4]. The underlaying working hypothesis is that domain labels, such as ARCHITECTURE, SPORT and MEDICINE provide a natural way to establish semantic relations between word senses, that can be used during the disambiguation process. This resource has already been used on Word Sense Disambi...
متن کاملGenerating Training Data for Semantic Role Labeling based on Label Transfer from Linked Lexical Resources
We present a new approach for generating role-labeled training data using Linked Lexical Resources, i.e., integrated lexical resources that combine several resources (e.g., WordNet, FrameNet, Wiktionary) by linking them on the sense or on the role level. Unlike resource-based supervision in relation extraction, we focus on complex linguistic annotations, more specifically FrameNet senses and ro...
متن کاملAnnotating WordNet
High-quality lexical resources are needed to both train and evaluate Word Sense Disambiguation (WSD) systems. The problem of ambiguity persists even in limited domains, thus the necessity for wide-coverage inventories of senses (dictionaries) and corpora sense-tagged to them. WordNet has been used extensively for WSD, for both its broad coverage and its large network of semantic relations. In t...
متن کاملCASSAurus: A Resource of Simpler Spanish Synonyms
In this work we introduce and describe a language resource composed of lists of simpler synonyms for Spanish. The synonyms are divided in different senses taken from the Spanish OpenThesaurus, where context disambiguation was performed by using statistical information from the Web and Google Books Ngrams. This resource is freely available online and can be used for different NLP tasks such as l...
متن کاملUncertainty in data integration systems: automatic generation of probabilistic relationships
We propose a method for the automatic discovery of probabilistic relationships in the environment of data integration systems. Dynamic data integration systems extend the architecture of current data integration systems by modeling uncertainty at their core. Our method is a probabilistic word sense disambiguation (PWSD), which allows to automatically lexically annotate (i.e. annotation w.r.t. a...
متن کامل